graph processing
GPML: Graph Processing for Machine Learning
Jaber, Majed, Michel, Julien, Boutry, Nicolas, Parrend, Pierre
The dramatic increase in complex, multi-step, and rapidly evolving attacks in dynamic networks calls for advanced cyber-threat detectors. The GPML (Graph Processing for Machine Learning) library addresses this need by transforming raw network traffic traces into graph representations, enabling advanced insights into network behaviors. The library provides tools to detect anomalies in interactions and community shifts in dynamic networks. GPML supports the extraction of community and spectral metrics, enhancing both real-time detection and historical forensic analysis. This library addresses modern cybersecurity challenges with a robust, graph-based approach.
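As a rough illustration of the idea of turning traffic traces into a graph, the following minimal sketch aggregates flow records into a weighted directed graph and computes a simple per-host metric. It is not GPML's API: the record layout `(source, destination, bytes)`, the sample data, and the function names `build_graph` and `out_degree` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical flow records: (source IP, destination IP, bytes transferred).
# The layout and values are illustrative assumptions, not a GPML input format.
FLOWS = [
    ("10.0.0.1", "10.0.0.2", 1200),
    ("10.0.0.1", "10.0.0.3", 300),
    ("10.0.0.2", "10.0.0.3", 800),
    ("10.0.0.1", "10.0.0.2", 500),
]


def build_graph(flows):
    """Aggregate flows into a weighted directed graph.

    Returns a dict where graph[src][dst] is the total number of bytes
    sent from src to dst across all flows.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, nbytes in flows:
        graph[src][dst] += nbytes
        graph[dst]  # touch the key so hosts that only receive still appear as nodes
    return {node: dict(neighbors) for node, neighbors in graph.items()}


def out_degree(graph):
    """Number of distinct destinations each host talks to."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


graph = build_graph(FLOWS)
degrees = out_degree(graph)
```

A host whose out-degree jumps between two time windows would be one simple signal of an interaction anomaly. Community and spectral metrics such as those the abstract mentions would need considerably more machinery than this sketch shows.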
The Evolution of Distributed Systems for Graph Neural Networks and their Origin in Graph Processing and Deep Learning: A Survey
Vatter, Jana, Mayer, Ruben, Jacobsen, Hans-Arno
Graph Neural Networks (GNNs) are an emerging research field. This specialized Deep Neural Network (DNN) architecture is capable of processing graph-structured data and bridges the gap between graph processing and Deep Learning (DL). As graphs are everywhere, GNNs can be applied to various domains including recommendation systems, computer vision, natural language processing, biology and chemistry. With the rapidly growing size of real-world graphs, the need for efficient and scalable GNN training solutions has arisen. Consequently, many works proposing GNN systems have emerged throughout the past few years. However, there is an acute lack of overview, categorization and comparison of such systems. We aim to fill this gap by summarizing and categorizing important methods and techniques for large-scale GNN solutions. In addition, we establish connections between GNN systems, graph processing systems and DL systems.
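To make the link between graph processing and deep learning concrete, here is a minimal sketch of one round of neighborhood aggregation, the graph-processing step at the core of many GNN layers. It is a simplification under stated assumptions and is not taken from the survey: there are no learned weights and no nonlinearity, and the names `message_passing_step`, `ADJ`, and `FEATS` and the toy graph are hypothetical.

```python
def message_passing_step(adjacency, features):
    """One round of mean aggregation.

    Each node's new feature vector is the element-wise mean of its own
    vector and the vectors of its neighbors. A real GNN layer would also
    apply a learned linear transform and a nonlinearity (omitted here).
    """
    updated = {}
    for node, neighbors in adjacency.items():
        vectors = [features[node]] + [features[n] for n in neighbors]
        dim = len(features[node])
        updated[node] = [
            sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)
        ]
    return updated


# Toy undirected graph (illustrative): a is connected to b and c.
ADJ = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
FEATS = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}

new_feats = message_passing_step(ADJ, FEATS)
```

Because each node reads only from its neighbors, this step resembles the vertex-centric computations of graph processing systems. That resemblance is one reason scaling it to large graphs raises partitioning and communication questions.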
Think like a vertex: using Go's concurrency for graph computation
Why do you do machine learning in Go? Of course, the main reason is that I like the language. But there are other, more generic, reasons. In the fifth episode of the third season of Command Line Heroes, Saron Yitbarek explains that Go's design is tightly linked to cloud infrastructure. Indeed, the concurrency mechanism makes it super easy to write a program that can run at scale on inexpensive machines.
Nvidia Rapids cuGraph: Making graph analysis ubiquitous - ZDNet
A new open-source library by Nvidia could be the secret ingredient to advancing analytics and making graph databases faster. Nvidia long ago stopped being "just" a hardware company. As its hardware is what much of the compute supporting the explosion in AI runs on, Nvidia has taken upon itself the task of paving the last mile to the software. Nvidia does this by developing and releasing libraries that software developers and data scientists can use to integrate GPU power in their work. The premise is simple: Not everyone is a specialist in parallelism or wants to be one.
5 Reasons to Become an Apache Spark Expert - The Databricks Blog
Apache Spark has fast become the most popular unified analytics engine for big data and machine learning. It was originally developed at UC Berkeley in 2009 by the team who later founded Databricks. Since its release, Apache Spark has seen rapid adoption. Today's most cutting-edge companies such as Apple, Netflix, Facebook, and Uber have deployed Spark at massive scale, processing petabytes of data to deliver innovations -- from detecting fraudulent behavior to delivering personalized experiences in real-time -- that are transforming every industry. Behind these groundbreaking innovations is a small but fast-growing group of talented engineers, developers, and data scientists with deep knowledge of Apache Spark.
Learning Path: Spark: Data Science with Apache Spark
Every year a large amount of data is generated that needs to be stored and analyzed. Apache Spark allows you to process such big data. The real power and value proposition of Apache Spark is its speed and its platform for executing data science tasks. Spark's unique use case is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations to allow data scientists to tackle the complexities that come with raw unstructured data sets. Spark embraces this approach and aims to ease the transition from working on a single machine to working on a cluster, something that makes data science tasks a lot more agile.
Data Science with Spark - Udemy
The real power and value proposition of Apache Spark is its speed and its platform for executing Data Science tasks. Spark's unique use case is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations to allow Data Scientists to tackle the complexities that come with raw unstructured data sets. Spark embraces this approach and aims to ease the transition from working on a single machine to working on a cluster, something that makes data science tasks a lot more agile. In this course, you'll get a hands-on technical resource that will enable you to become comfortable and confident working with Spark for Data Science. We won't just explore Spark's Data Science libraries; we'll dive deeper and expand on the topics.
Graph-Structured Representations for Visual Question Answering
Teney, Damien, Liu, Lingqiao, Hengel, Anton van den
This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the need for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the form of the question. CNN feature vectors cannot effectively capture situations as simple as multiple object instances, and LSTMs process questions as series of words, which does not reflect the true complexity of language structure. We instead propose to build graphs over the scene objects and over the question words, and we describe a deep neural network that exploits the structure in these representations. This shows significant benefit over the sequential processing of LSTMs. The overall efficacy of our approach is demonstrated by significant improvements over the state-of-the-art, from 71.2% to 74.4% in accuracy on the "abstract scenes" multiple-choice benchmark, and from 34.7% to 39.1% in accuracy over pairs of "balanced" scenes, i.e. images with fine-grained differences and opposite yes/no answers to the same question.